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We have performed parallel tempering Monte Carlo simulations using a simple continuum heteropolymer 
model for proteins. All ten heteropolymer sequences which we have studied have shown first-order transitions 
at low temperature to ordered states dominated by single chain conformations. These results are in contrast 
with the theoretical predictions of the random energy model for heteropolymers, from which we would expect 
continuous transitions to glassy behavior at low temperatures. 

I. INTRODUCTION 

Many biological heteropolymers (in particular proteins) can adopt a single, sequence dependent conformation (the fold, or na- 
tive state) under physiological conditions. This specificity allows groups of atoms within such a molecule to be placed accurately 
in space, allowing the vast range of precise biochemical functions within living organisms. Upon altering the environment around 
the protein (e.g. pH, solvent quality, temperature, etc.), the molecule can be made to lose this single conformation (i.e. unfold or 
denature), adoptinga fluctuating, disordered state. This, folding transition has been shown to be both reversible and first-order 
for many proteins d], and is still poorly understood. 

The modern theoretical viewpoint on the thermodynamics of protein folding (recently reviewed by Pande et al. f5]) uses 
concepts from the theory of spin glasses. This approach was introduced by Bryngelson and Wolynes 1 3], who used the random- 
energy model (REM) |4] to describe the low temperature behavior of heteropolymers. This simple model assumes the energies 
of the possible conformations of a heteropolymer to be independent and uncorrected, and predicts a second order transition (no 
latent heat) to a glassy state dominated by a low energy conformation. The more complicated theory of Shakhnovitch et al. | 3 
uses another trick from the spin glass repertoire, the replica approach, to arrive at the same conclusions. 

The low energy conformation occupied in the glassy state is not necessarily the native state. Depending upon details of 
conditions, sequence and sample history, the heteropolymer may well become "misfolded"; that is, locked in to a non-native, 
low energy conformation. Indeed, most randomly selected sequences should have several closely-spaced conformations at the 
low end of their energy spectrum, and will not reliably or reproducibly fold into only one of these. Some annealing of the 
heteropolymer sequence (i.e. design work) should be needed to "drive down" the energy of a chosen (target) conformation well 
below the closely-spaced region of the energy spectrum, preventing sampling of competing conformations and allowing robust 
folding |2]. Simulations |2, 6] and numerical studies |7] using lattice heteropolymer models support these predictions and the 
approximations behind the models. 

However, the random energy approximation is essentially mean field in nature, and cannot accurately treat systems with per- 
sistent correlations fS]. Further, lattice models artificially constrain the configurations available to a polymer. While lattice 
homopolymer results have been instructive at high temperatures |9, 10], they do not predict the low temperature freezing tran- 
sition seen in the simulations of Zhou et al. LI L.121 . where an isolated homopolymer freezes into a single, apparently unique 
crystalline state via a first-order transition. 

We have performed Monte Carlo simulations using ten random sequence realizations of a simple off-lattice heteropolymer 
model to test these predictions in continuum systems. We report that, rather than exhibiting glassy behavior at low temperature, 
all ten sequences studied pass through a first-order transition to an ordered low energy state, dominated by a single conformation, 
without need for sequence design work. 

II. MODEL 

Our heteropolymers are modeled as freely-jointed chains of hard-sphere monomers of diameter a, each assigned a type; 
we denote the type of monomer / by s,. The model is a simple extension of that used by Zhou et al. and is shown 



schematically in Fig.^ Bonded monomers interact via the potential 
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The parameter 5 is a bond-length flexibility parameter; this allows for conversion to molecular dynamics simulation and has 
been shown to make no phenomenological difference to homopolymer folding results fT]l[l2il . We use 5 = 0. 1, allowing a ten 
percent variation in bond length. 

If monomers / and j are not directly bonded (that is, \i — j\ > 1), then they interact via a square-well potential: 



{°° r < (J 

£s,sj a < r < Xa (2) 
Xa < r 

where r denotes the distance between monomers / and j, Act is the square well width, and e^^/ is an interaction matrix giving 
square-well interaction^trengths (square-well depths) between the monomer types. In this work, we use a modified hydrophobic- 
polar, or mHP matrix ll36ll . This is given by: 



where e' is the characteristic energy scale. With this model, monomers with = are hydrophobic-like (H-type), preferring type- 
symmetric interactions, while type Sj = 1 monomers are polar-hke (P-type), preferring type-asymmetric interactions. Similar 
interaction matrices are widely used in on-lattice protein folding simulations and numerical work fisl fl4l fTsIl and are believed 
to be good approximations to real intraprotein interactions |16]. In the dense, globular heteropolymer state, we expect this 
interaction matrix to lead to configurations with a core of hydrophobic monomers surrounded by polar monomers, as seen in 
globular proteins. 

We use reduced units scaled by the hard-sphere radius a and the energy scale e', such that 



r* = rja 

E* = E/e' (4) 
T* = keT/e' 

Polypeptides of length greater than approximately 30 amino acids can form stable folded domains (e.g. the 35 amino acid WW 
domain lYW ). We have chosen to study = 64 length heteropolymers; this length is short enough for simulation in reasonable 
time, while long enough that equivalent real proteins can fold. We have studied ten random sequences interacting via the mHP 
interaction matrix, each with a 1 : 1 H:P composition ratio. If we consider P-type monomers to map onto charged and polar 
amino acid residues, and H-type monomers to map to all other amino acid residues, this composition ratio is similar to that 
found in real proteins fisil . 



III. SIMULATION METHOD 



Monte Carlo simulations were performed for a single polymer in isolation. As such, we have not used periodic boundary 
conditions. We have implemented crankshaft, pivot and continuum configurational bias (CCB) polymer moves 1 19]. Bond length 
fluctuation is allowed both in the CCB move implementation, as well as through standard Monte Carlo particle displacement 
moves performed on the monomers. CCB moves regrow end segments of up to four monomers. Moves are performed with 
relative probabilities in the ratio A^: (1/2): (1/4): (1/4) for particle displacement, crankshaft, CCB and pivot moves respectively. 
Initial equilibration runs last 3 x 10^ Monte Carlo move attempts; data collection runs last 1 x 10^ move attempts. 

The Monte Carlo simulations also use parallel tempering (also known as replica exchange) fl^. Several Monte Carlo sim- 
ulations of the same system at differing temperatures are run in parallel; these simulations attempt to exchange configurations 
at regular intervals. This helps to maintain the ergodicity of the simulations; an individual system which has become stuck 
at low temperature in a metastable "trap" may be promoted to higher temperatures, where it can explore a wider range of 
phase space, aiding its equilibration. Our parallel tempering setup uses ten individual simulations across the temperature range 



T* — 0.15 ... 1.5, distributed such that the ratio between consecutive temperatures is constant 112 IB . Configuration swap moves 
are attempted every 1000 Monte Carlo move attempts. 

The simulations sample the configurational energy E, the radius of gyration r^, the two-particle density p(^'(r), and the 
instantaneous contact map c for the system every 10^ Monte Carlo move attempts. The two-particle density is defined as 



^'''W = ^(IL5M,)^ (5) 



where r,^ is the distance between monomers ; and j ^122 

The instantaneous contact map c reflects a heteropolymer conformation; it is an x matrix, with each element Cjj equal 
to one if monomers ; and j are within each others square well, and zero otherwise. The instantaneous contact map is therefore 
given by: 

rij<Xa 

1 nj>la 

The average contact map ^c,,) gives us information on the structure adopted by the system at given temperature. We can use 
it to calculate a normalized thermal average overlap: 

This normalized overlap parameter can take values between zero and one. Values close to one indicate that the system is 
dominated by only a very small number of very similar conformations; small values indicate that the system is exploring a 
variety of different conformations. 

Energy and radius of gyration data from the individual simulations within a parallel tempering run are tied together using self 
consistent multiple histogram extrapolation |23]. This allows the histograms of energy E and other observables {A} recorded 
from the individual simulations to be combined to estimate i2(£', {A}), the density of states of the system. From this, we can 
interpolate the behavior of the system across, and to a certain extent beyond, the simulated temperature and parameter range. 
We use the extrapolated distribution at given temperature to calculate the constant volume specific heat c*, given by 

{E*^)-lE*f 

Histogram extrapolation methods rely on good sampling across the range of interest by the individual simulations. As such, 
simulations around first-order transitions with a significant free energy barrier between phases can be problematic. If simulations 
on either side of the transition temperature sample only the phase which is most stable at that temperature, but no simulation 
actually observes the system passing back and forth between phases, the extrapolation process can "blur out" the transition, 
leaving a signature only in an increase in the error estimate and the response functions (e.g. specific heat) across the region of 
the transition. Worse still, if the system exhibits strong metastability, this can introduce a large systematic error into any estimate 
for the position of the transition. 

To avoid this problem, we have used multicanonical sampling ll24ll to directly confirm or deny phase coexistence where we 
suspect the existence of a first-order transition. We have used an iterative multicanonical scheme fl^. Repeated biased Monte 
Carlo simulations refine our estimate for the density of states of the system, with the aim of sampling evenly in energy across 
the suspected range of coexistence. The bias can then be removed from the results of these simulations, and the refined estimate 
for the density of states can be spliced back into the wider simulation results. The individual simulations used in this procedure 
are performed using the data collection run parameters stated above. 



IV. SIMULATION RESULTS 



Plots of E*, c* and r* for our sequences, interpolated across the studied temperature range, are shown in Fig. |2l The phe- 
nomenology does not vary considerably with sequence. Above a temperature of T* k, 0.9, the heteropolymer exists in a dis- 
ordered coil state - a high energy structure that rapidly increases its radius of gyration and decreases its heat capacity (energy 



fluctuations) with temperature. Between T* « 0.3 and T* w 0.9, we see the expected low temperature coflapsed globule state 
with a small radius of gyration (order of 0.05(7) and with relatively high heat capacity (order of LOks). Around T* « 0.26, 
however, we see sharp spikes in the specific heat, associated with sudden drops in both conformational energy and radius of 
gyration with decreasing temperature, suggestive of first-order transitions. We have used multicanonical sampling to study these 
apparent transitions; for all sequences, we have found temperatures at which double peaked probability distributions occur for 
conformational energy, indicating two-phase coexistence. These probability distributions are shown in Fig.|3la). The sequence 
dependence of the probability distributions is small. 

In Fig. |3lwe plot coexistence temperatures for the sequences studied against the energy difference AE* /T* between the 
coexisting phases. The quantities are clustered around T* « 0.27 and AE/ksT « 56, but otherwise appear uncorrected. For 
comparison, the typical latent heat AH of the folding transition for a globular protein is of order of lOOksT tz^ . 

A. Structural character of the low energy phase 

We present two-particle densities p'^' data, calculated using Eq. (|5|l, for sequence 4 in Fig.|3 these plots are typical of the 
results for all ten sequences studied. These data were collected before the multicanonical algorithm was used, and may exhibit 

some degree of metastability. We first consider p^j^ (r*) ,the hydrophobic-hydrophobic two-particle densities, in Fig.^Ja). We 
see that below the transition temperature, the first three neighbor shells are clear and distinct, separated by regions with near- 
zero hydrophobic particle density. The next three neighbor shells are much more sparsely populated (limited by the diameter 
of the collapsed state), but remain distinct, separated by sharp minima. These data show a low energy phase structure with 

(2) 

local translational order between hydrophobic monomers. For (r*) above the transition, the first three neighbor shells are 
still clear up to T* — 0.7, but not distinctly separated. The next three neighbor shells have blurred together into one single 
peak. Above the transition temperature, the hydrophobic monomers have no translational order, as expected in a disordered 
globular phase. We show typical low-energy phase and globular phase configurations in Fig. |5j the orientation of the low-energy 
configuration has been chosen to clearly show the hydrophobic monomers arranged in hexagonal crystal planes, whereas no 
crystal structure exists in the globular configuration. 

FiguresUb) and (c) show hydrophobic -polar and polar-polar two-particle densities p^j, (r*) and p^p (r*), respectively. Nei- 
ther show any sign of translational order, either above or below the transition. The polar monomers remain in a disordered state 
across the temperature range studied. 

We show the thermal average contact maps in Fig.|6l Note that these data are collected from equilibrated parallel tempering 
runs before the multicanonical algorithm is used, and as such exhibit metastability. At low temperatures (below T* w 0.32), we 
see that the contact maps are dominated by very persistent contacts, suggesting that the low temperature structure is dominated 
by very few configurations. Contact maps collected at higher temperatures show large low persistence regions, as expected for a 
disordered globular state. 

To quantify this, plots of the normalized overlap parameter, calculated from these contact maps, are shown in Fig.0 As with 
the contact maps, these results exhibit metastability around the transition. For all sequences, we see a large increase on cooling 
in (=S„) from values of (=S„) w 0.3 in the disordered globule phase to values of (=S„) w 0.85 in the low energy phase. This 
confirms that the low energy phase is dominated by small fluctuations around a single conformation. 

V. DISCUSSION AND CONCLUSIONS 

Our results indicate that, for all of the ten random heteropolymer sequences which we have studied, there exists a low temper- 
ature transition between the familiar collapsed liquid-like globule phase and a low energy phase. These transitions do not have 
the glassy character expected from replica theory, showing instead double-peaked energy distribution functions with substantial 
associated latent heat. In each case, the low energy phase is more compact than the disordered globule, having a lower rg, 
exhibits local translational order and crystal planes, as shown by the two-particle density function p (similar to the low energy 
phases seen in off-lattice homopolymer simulations fTll [l2il '). and is not glassy, being dominated by a single conformation of the 
polymer backbone, as shown by the normalized thermal average overlap parameter (=S„}. In analogy to the more familiar phase 
behavior of bulk systems, we describe the transition as a freezing transition, and the low-energy state as a frozen state. We note 
the obvious parallels with protein folding transitions, where polypeptide heteropolymers undergo a first-order phase transition 
to a collapsed single conformation, exhibiting repeating structural units. 

The large computational effort associated with simulating high-density, low temperature continuum polymer states has limited 
the number of sequences we have been able to study. Nevertheless, we can use the Newcombe-Wilson score method (method 
3 in Ref. iIztIi ') to estimate with 95% confidence that at least 72.25% of 64 length 1:1 HP composition ratio sequences will 
demonstrate an equivalent first-order freezing transition to a non-glassy single conformation state. 



We suggest that the reason that we see different behavior from equivalent lattice models, which undergo low temperature glass 
transitions, is a matter of symmetries. As evidenced by the p*^^' data (see Fig.|4j, the disordered globular state for our sequences 
has no translational order, with hydrophobic monomers able to occupy any position, within the bounds of bond lengths and 
hard-sphere overlap. Passing below the freezing temperature, the frozen states of our sequences gain crystalline-like order |37j; 
hydrophobic monomers must occupy lattice points, or points very near them. The spatial symmetry of the system has been 
broken in a manner which, in three dimensions, requires a first-order phase transition. In the lattice model, the approximation is 
that spatial symmetry is already broken for the system at all temperatures, and there is no possibility of such an order-disorder 
transition. Further, the symmetry of the lattice is fixed in advance; if the frozen state of the off-lattice system is on a different 
lattice, or if some distortion of the lattice is necessary for stability, the lattice model cannot accurately portray the ordered frozen 
state. With cooling, the lattice system will remain in a disordered state, eventually forming a glass. 

The crystalline nature of the frozen states seen here do not resemble the ordered structures seen in biological heteropolymers 
(i.e., proteins); we see no signature of large scale helices or j3-sheets in the contact maps. We believe this to be due to the 
complete flexibility of the chains in the model. It has been suggested l'28'l that helices and j3 -sheet like "saddle" structures are 
optimal shapes for a stiff stringlike object, in the same manner that the ubiquitous face centered cubic crystal structure is the 
optimal arrangement for sets of spheres. We are currently investigating this matter for both homo- and heteropolymeric systems. 

It can be seen from the contact maps (see Fig.|6j that the frozen configurations are strongly sequence dependent. However, the 
thermodynamic behavior of the heteropolymers (Fig.|2} has very little (though still statistically significant) sequence dependence. 
We believe that the thermodynamic behavior, in the HP model at least, has stronger composition dependence than sequence 
dependence. Our 1:1 H:P composition ratio (with its freezing temperatures bunched around T* k, 0.27) lies between the limits 
of the pure hydrophobic homopolymer (with a freezing temperature of T* = 0.33 1 11 1), and the pure polar homopolymer with 
its effectively hard-sphere monomers, which should not freeze in isolation. 

It is generally suggested that the fraction of heteropolymeric sequences capable of freezing to unique conformations should 
be small, around 10^^ ll29il . Dill and coworkers estimate the chance of finding a protein sequence which will fold to a given 
approximate structure as being of order 10^'" 1 15, 30, 31]. Recent experimental work, using a large random-sequence protein 
library to find sequences which bind to ATP, estimates the fraction of protein sequences which are "functional" as around 10^' ' 
ll32ll . For our system, we have estimated that over 70% of 64-length 1:1 composition ratio HP sequences freeze to unique 
conformations. Similar over-estimation is also seen for lattice models; the impressive direct enumeration performed by Li 
et al. 1 14] on an HP lattice model suggests that as many as 4.75% of all 27-length HP sequences have unique ground state 
configurations. 

We believe such over-estimation to be due to a lack of sources of frustration in our model, as compared to real protein systems. 
Our model has only two monomer types, no size polydispersity or side chains, completely flexible bonds, and no orientational 
dependence in its interaction potential. Any of these factors could significantly lower the number of model sequences which 
exhibit folding-like behavior 

In summary, we have demonstrated that a simple continuum heteropolymer model can exhibit a first-order transition to an 
ordered low temperature state dominated by a single conformation, without glassy behavior. This behavior is similar to that seen 
in protein folding. It would be of interest to see whether extensions to this model yield further protein-hke character, for example 
secondary structure, or instead introduce frustrations which prevent the observed behaviors. 
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Figures 



Figure 1: Schematic drawing of molecular model; a system of spherical monomers (filled circles), diameter a. Monomers are joined together 
in a non-branching linear sequence with bonds of length (t(1 — 5)</<c7(l-l-5). Non-bonded monomers interact if they are within distance 
Act of each other (shown with dotted circles); in the figure, monomers / + 1 and (' 4- 2 are interacting. 




Figure 2: (a) Energy per unit monomer, (b) heat capacity and (c) radius of gyration as a function of temperature for tlie ten sequences studied. 
The scale of plot (c) has been chosen to show the low temperature behavior; the inset shows the rg behavior over the full temperature range 
studied. Curves are calculated at T* intervals of 0.01 using Q. (£, rg) calculated from parallel tempering runs as described in the text. Estimated 
errors are smaller than symbol size for radius of gyration and energy data. 




Figure 3: (a) Coexistence energy distributions p(E* /N)foi the sequences studied, (b) Coexistence temperature T*^^^ plotted against energy 
difference AE/hgT between the coexisting phases. Color denotes sequence in both graphs. 
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Figure 4: Two-particle densities for sequence 4 {T*^,,^ = 0.263). In each case, the top graph shows p(^^ (r*) for temperatures below the 
observed transition, and the bottom graph shows p(^' (r*) for temperatures above the observed transition. Discontinuities are present in the 
data at r* = 0.9 and r* = 1.1 due to the discontinuities in the direct binding potential Eq. and at r* = 1.0 and r* = 1.5 due to the 
discontinuities in the square- well potential Eq. J2j- Curves are color coded according to the temperature at which the data was collected, as 

shown in the legend: (a) p^j^ (r*), (b) p^^j, (r*), (c) pj.^' (r*). 
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(b) 



Figure 5; Typical heteropolymer configurations, generated with gOpenMol f34','35'|. Pale spheres represent hydrophobic monomers, dark 
spheres represent polar monomers, (a) Typical low energy phase configuration, with perspective chosen to clearly show hexagonal organization 
of hydrophobic monomers. In the top picture, display of polar monomers has been suppressed to better show the hydrophobic core, (b) Typical 
disordered globular phase configuration. In the top picture, display of polar monomers has been suppressed to better show the hydrophobic 
core. 
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Figure 6: 

Figurel6l Average contact maps (c(r*)) for the ten sequences studied. Each contact map is shown as a 64 x 64 pixel block, pixels 
corresponding to matrix elements, with the contact persistence (c/j) given by the shade of the pixel as shown in the key. Contact maps are 
symmetric across the diagonal (c = c^), so only the upper half is shown. Contact maps in the same column are from the same sequence. 

Contact maps in the same row were collected at the same temperature. 




Figure 7: Normalized thermal average overlap parameter for the ten mHP sequences studied, calculated from the thermal average contact 
maps shown in Fig|6| Different symbols denote different sequences. Lines serve only as a guide to the eye. 



